Introduction to INLA

1 INLA

In Bayesian inference, one typically seeks to compute the posterior distribution of the parameters of a model from the available data and from assumptions about how the parameters behave before seeing the data (the prior distribution). The typical approach is to employ some MCMC method, such as Gibbs sampling or Metropolis-Hastings; however, more efficient approaches have emerged relatively recently.
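In symbols, the object of interest is the posterior obtained from Bayes' theorem:

```latex
\pi(\theta \mid y) \;=\; \frac{\pi(y \mid \theta)\,\pi(\theta)}{\pi(y)} \;\propto\; \pi(y \mid \theta)\,\pi(\theta)
```

where pi(theta) is the prior, pi(y | theta) the likelihood, and pi(y) the marginal likelihood, whose intractability is what makes exact computation hard in practice.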

The Integrated Nested Laplace Approximation, or INLA, is a relatively recent method for fitting Bayesian models. The INLA approach aims to overcome the computational difficulty of MCMC in data-intensive problems or complex models: in many applications, sampling from the posterior distribution with MCMC can take too long, and is often not even feasible with existing computational resources.

2 Descriptive analysis

First of all, we load the data and all the packages needed for this workshop:

library(kableExtra)
library(tidyverse)
library(INLA)
library(yardstick)
library(gt)
library(innovar)
library(spdep)
library(reshape2)
library(viridis)

db       <- readRDS("db_excess_proc_dis.rds")

# Total count of deaths 
db.tl    <- readRDS("db_excess_proc_agr.rds")  

We proceed with a descriptive temporal analysis. To inspect the evolution of the number of deaths, we use the ggplot2 package:

db.tl %>% 
ggplot()+
geom_line(aes(x=date,y=n),color="#aa3d01",lwd=0.5)+
geom_point(aes(x=date,y=n),size=2.5,colour="#aa3d01",shape=21)+
ylab("Number of deaths")+
xlab("")+
ggtitle("Evolution of the number of deaths")+
theme_bw()+
theme(plot.title = element_text(hjust = 0.5,size = 35),
      strip.background =element_rect(fill="#850503"),
      strip.text = element_text(color = "white",face = "bold",size = 5)) 

Similarly, we can examine the above in a spatial context. To do so, we load a shapefile that contains the geographic information of Peru and join it with our data:

#load shapefile
data("Peru")

db.sp.peru <- Peru                     %>% 
              group_by(reg,prov)       %>% 
              summarise()              %>% 
              left_join( db            %>% 
              group_by(year,reg,prov)  %>% 
              summarise(n=mean(n)))    

Later we can plot it using ggplot again:

#plot 


db.sp.peru %>% 
ggplot()+
geom_sf(aes(fill=n))+
theme_bw() +
facet_wrap(vars(year)) + 
scale_fill_viridis(name="Average number\n of cases",direction = -1,option = "rocket")+
theme(plot.title = element_text(hjust = 0.5,size = 30),
      strip.background =element_rect(fill="#aa3d01"),
      strip.text = element_text(color = "white",face = "bold",size = 5)) 

2.1 Creating a model with INLA

To fit models using the INLA approach we can use the inla() function, which has several parameters; some of the most important are:

  • data : An object, typically of class data.frame, containing the data used to fit the model.

  • formula : An object of class formula that specifies the equation we intend to fit, for example inla(target ~ 1 + x1). In the formula we can specify linear effects (by introducing the name of the variable) or non-linear effects (using f()).

  • verbose : A boolean variable indicating whether to print the convergence process to the console.

peru.m0 <- inla(n ~ 1 + annus,
                  verbose         = F,
                  data            =  db
                  )

The parameters detailed above are the essential ones for fitting a model with INLA. However, some extra parameters to consider are the following:

  • family : An object of class character. This parameter is crucial, as it determines the distribution of the target variable; by default it is family = "gaussian".
  • control.compute : An object of class list that allows requesting information criteria such as dic, waic, or cpo.

peru.m1 <- inla(n ~ 1 + annus,
                  verbose           = F,
                  data              =  db,
                  family            = "Gaussian",
                  control.compute   = list(dic=TRUE, waic=TRUE,cpo=TRUE),
                  control.predictor = list(link = 1)
                  )

In addition, we can use more variables in the linear predictor or make a different assumption about the distribution of the target variable in order to obtain a better model. Since we are modelling a count process, we will assume a negative binomial distribution and employ socio-economic and climatic variables that may explain the variability in the number of deaths in Peru.

peru.m2 <- inla(n ~ 1 + annus + temperature+ INEI_prop_aseg_2017 + prop_pobreza_2018 + INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017,
                  verbose           = F,
                  data              =  db,
                  family            = "nbinomial",
                  control.compute   = list(dic=TRUE, waic=TRUE,cpo=TRUE),
                  control.predictor = list(link = 1)
                  )

Until now we have only considered linear effects, but with INLA we can also model non-linear effects that control for temporal and spatial structure.

2.1.1 Temporal Effects

  • AR1 (1st order autoregressive process) : f(variable, model = "ar1")
  • RW1 (1st order random walk) : f(variable, model = "rw1")
  • RW2 (2nd order random walk) : f(variable, model = "rw2")
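To build intuition for these priors, the sketch below (base R only, not INLA code) simulates one draw from an RW1 and an AR1 latent process; the particular numbers (length 52, sd 0.3, rho = 0.8) are arbitrary illustration values.

```r
# One simulated path from each latent process assumed by f():
set.seed(1)
n   <- 52                                   # e.g. weeks in a year
# RW1: x_t = x_{t-1} + eps_t  -> a wandering, non-stationary trend
rw1 <- cumsum(rnorm(n, sd = 0.3))
# AR1: x_t = rho * x_{t-1} + eps_t  -> mean-reverting around 0
rho <- 0.8
ar1 <- as.numeric(stats::filter(rnorm(n), rho, method = "recursive"))
```

Plotting rw1 and ar1 side by side shows why RW1 suits slowly drifting trends, while AR1 suits fluctuations around a stable level.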
# Baseline: the years enter only as a linear (fixed) effect

peru.m3 <- inla(n ~ 1 + annus + temperature+ INEI_prop_aseg_2017 + prop_pobreza_2018 + INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017,
                  control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
                  control.predictor = list(link = 1),
                  verbose         = F,
                  family          = "nbinomial",
                  data            =  db
                  )

# Assuming an a priori Distribution for the weeks: "rw1"
peru.m4 <- inla(n ~ 1 + annus + f(week,model="rw1") + temperature+
                    INEI_prop_aseg_2017 + prop_pobreza_2018 + INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017,
                  control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
                  control.predictor = list( link = 1),
                  verbose         = F,
                  family          = "nbinomial",
                  data            =  db
                  )
# Both temporal priors at the same time
peru.m5 <- inla(n ~ 1  + f(annus,model="ar1")+ f(week,model="rw1")+ temperature+ 
                          INEI_prop_aseg_2017 + prop_pobreza_2018 + INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017,
                  control.compute = list(dic=TRUE, waic=TRUE,cpo=TRUE),
                  control.predictor = list( link = 1),
                  verbose         = F,
                  family          = "nbinomial",
                  data            =  db
                  )

2.1.2 Spatial Effects

  • besag spatial effect : f(area, model = "besag")
  • proper besag spatial effect : f(area, model = "besagproper")
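For reference, the besag (intrinsic CAR) prior defines each area's effect conditionally on the average of its neighbours:

```latex
u_i \mid u_{-i} \sim \mathcal{N}\!\left( \frac{1}{n_i}\sum_{j \sim i} u_j ,\; \frac{\sigma_u^2}{n_i} \right)
```

where j ~ i runs over the neighbours of area i and n_i is their number, so areas with more neighbours are shrunk more strongly towards their local mean.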

To include these priors as spatial effects, we first need the geographic information of Peru, expressed as a shapefile at the provincial level. Besides loading the shapefile, we group the data at the provincial level and use the summarise function to dissolve the polygons of the shapefile to that level:

data("Peru")

peru.prov<-Peru               %>% 
           group_by(reg,prov) %>% 
           summarise()

Subsequently, we create a neighborhood structure from the shapefile and, from that neighborhood structure, the spatial weight matrix:

# Creating the neighborhood structure
peru.adj    <- poly2nb(peru.prov)
# Spatial weights
W.peru <- nb2mat(peru.adj, style = "W") 
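To see the kind of object nb2mat(..., style = "W") returns, here is a hypothetical three-area adjacency matrix row-standardised by hand in base R:

```r
# A[i, j] = 1 if areas i and j are neighbours, 0 otherwise
A <- rbind(c(0, 1, 1),   # area 1 borders areas 2 and 3
           c(1, 0, 0),   # area 2 borders area 1 only
           c(1, 0, 0))   # area 3 borders area 1 only
W <- A / rowSums(A)      # style = "W": each row sums to 1
W[1, ]                   # 0.0 0.5 0.5
```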

Finally, we remove non-existent provinces and create an id for each province in order to identify each polygon. The latter is an extra requirement for fitting spatial models in INLA:

db.peru.sp<- db                             %>% 
             group_by(prov)                 %>% 
             mutate(id.sp=cur_group_id())

Now we are ready to fit a model whose province-level effects are spatially correlated:

peru.m6   <- inla(n ~ 1 + annus + 
                            f(week,model="rw1")       +
                            f(id.sp, model = "bym", 
                            graph=W.peru)             +
                            temperature  + INEI_prop_aseg_2017 + prop_pobreza_2018 + 
                            INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017, 
                   data              = db.peru.sp,
                   control.compute   = list(dic = TRUE, 
                                            waic = TRUE, 
                                            cpo = TRUE),
                   control.predictor = list(link = 1)
                  )

It is worth mentioning that it is just as easy to assume that these province effects are independent, using f(area, model = "iid"):

peru.m7   <- inla(n ~ 1 + annus + 
                            f(week,model="rw1")       +
                            f(id.sp, model = "iid")   +
                            temperature  + INEI_prop_aseg_2017 + prop_pobreza_2018 + 
                            INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017, 
                   data              = db.peru.sp,
                   control.compute   = list(dic = TRUE, 
                                            waic = TRUE, 
                                            cpo = TRUE),
                   control.predictor = list(link = 1)
                  )

On the other hand, we can inspect our estimates and plot them. First, we access the fitted linear effects using InlaObject$summary.fixed:

f.m3<-peru.m3$summary.fixed %>% mutate(Model="linear") %>% rownames_to_column("Variable")
f.m4<-peru.m4$summary.fixed %>% mutate(Model="linearRW(1)") %>% rownames_to_column("Variable")
f.m5<-peru.m5$summary.fixed %>% mutate(Model="AR(1)RW(1)") %>% rownames_to_column("Variable")
f.m6<-peru.m6$summary.fixed %>% mutate(Model="linearRW(1) with spatial effects ")  %>% rownames_to_column("Variable")
f.m7<-peru.m7$summary.fixed %>% mutate(Model="linearRW(1) with iid effects ")  %>% rownames_to_column("Variable")

fix.data<-rbind(f.m3,f.m4,f.m5,f.m6,f.m7)%>% 
          filter(!str_detect(Variable, 'annus'))

and plot the coefficients:

        fix.data %>% 
        ggplot(aes(colour=Model))  + 
          geom_linerange(aes(x = Variable, 
                     ymin = `0.025quant`,
                     ymax = `0.975quant`),
                     position = position_dodge(width = 1/2),
                     lwd=1) +
        geom_pointrange(aes(x = Variable, y=mean,
                            ymin = `0.025quant`,
                            ymax = `0.975quant`
                            ),position = position_dodge(width = 1/2), lwd = 1/2,shape=21, fill = "WHITE") + 
        scale_color_manual(values = c("#850503","#f55e2c","#df861d","#ffd16c","#f9ebac"))+
        ggtitle("Fixed effects") +
        geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
        coord_flip() +
        theme_bw()

3 Model accuracy

3.1 Estimated value

We can proceed to analyse our fitted models in space, accessing the fitted values with InlaObject$summary.fitted.values$mean:

db.peru.sf<-  Peru                     %>% 
              group_by(reg,prov)       %>% 
              summarise()              %>% 
              inner_join(db.peru.sp %>% 
              ungroup()  %>% 
              mutate(fit=peru.m6$summary.fitted.values$mean),c("reg"="reg","prov"="prov")) %>% 
              group_by(reg,prov,year) %>% 
              summarise(n=mean(n),
                         fit=mean(fit))


map.true<-db.peru.sf %>% 
          filter(year==2019) %>% 
          ggplot()+
          geom_sf(aes(fill=n))+
          theme_bw() +
          facet_wrap(vars(year)) + 
          scale_fill_viridis(name="Average number\n of cases",option="rocket",direction = -1)+
          theme(plot.title = element_text(hjust = 0.5,size = 20),
          strip.background =element_rect(fill="#aa3d01"),
          strip.text = element_text(color = "white",face = "bold",size = 5)) 
         # scale_fill_distiller(name="Average number\n of cases",direction = -1)


map.fit<-db.peru.sf %>% 
          filter(year==2019) %>% 
          ggplot()+
          geom_sf(aes(fill=fit))+
          theme_bw() +
          facet_wrap(vars(year)) + 
          scale_fill_viridis(name="Average number\n of cases",option="rocket",direction = -1)+
          theme(plot.title = element_text(hjust = 0.5,size = 20),
          strip.background =element_rect(fill="#aa3d01"),
          strip.text = element_text(color = "white",face = "bold",size = 5)) 
         # scale_fill_distiller(name="Average number\n of cases",direction = -1)


cowplot::plot_grid(map.true,map.fit)

3.2 Performance metrics

In order to assess the models, we will calculate in-sample performance metrics. First, we collect the fitted values from the INLA objects:

fit.m.m3    <- peru.m3$summary.fitted.values$mean
fit.m.m4    <- peru.m4$summary.fitted.values$mean
fit.m.m5    <- peru.m5$summary.fitted.values$mean
fit.m.m6    <- peru.m6$summary.fitted.values$mean
fit.m.m7    <- peru.m7$summary.fitted.values$mean
n.peru      <- db.peru.sp$n

datos  <-  list("AR(1)"=fit.m.m3,"RW(1)"=fit.m.m4,"AR(1)RW(1)"=fit.m.m5,"spatial effects"=fit.m.m6,"iid effects"=fit.m.m7) %>% 
           as.data.frame() %>% 
           melt( variable.name = "modelo",
                 value.name    = "fit") %>% 
           group_by(modelo) %>% 
           mutate(actual    = n.peru,
                  date      = db.peru.sp$date,
                  reg       = db.peru.sp$reg,
                  prov      = db.peru.sp$prov)

Having reshaped the data to long format to facilitate the calculation, we use the yardstick package to compute the following metrics: mae, mase, smape, rmse, msd. We can calculate these metrics over the whole dataset:

#Metrics to use
perform.metrics <- metric_set(mae,mase,smape,rmse,msd)

# Calculation of metrics in the forecast year
tbl.yrd.full <- datos                    %>% 
                 group_by(modelo)         %>%
                 perform.metrics(truth    = actual, 
                               estimate = fit)
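As a sanity check, three of these metrics can be computed by hand on toy vectors (illustration only; the msd sign convention here is truth minus estimate, which may differ between implementations):

```r
# Hand-computed metric definitions on toy vectors
actual <- c(10, 20, 30)
fit    <- c(12, 18, 33)
mae_v  <- mean(abs(actual - fit))        # mean absolute error = 7/3
rmse_v <- sqrt(mean((actual - fit)^2))   # root mean square error
msd_v  <- mean(actual - fit)             # mean signed deviation = -1
```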

And per province:

#Metrics to use
perform.metrics <- metric_set(mae,mase,smape,rmse,msd)

# Calculation of metrics in the forecast year
tbl.yrd.per <- datos                    %>% 
               group_by(modelo,reg,prov)         %>%
               perform.metrics(truth    = actual, 
                           estimate = fit)

Finally, we use the gt package to display the results in a customized table:

# Results table
tbl.yrd.full                               %>% 
pivot_wider(id_cols     = modelo,
            names_from  = .metric,
            values_from = .estimate)    %>%         
gt()                                    %>%
tab_header(
    title = md("in-sample accuracy metrics")#,
    #subtitle   ="out-of-sample accuracy metrics"
    ) %>% 
data_color(
    columns = vars(mae,mase,smape,rmse,msd),
    colors = scales::col_numeric(
      palette = c(
        "#aa3d01","white"),
      domain = NULL)
  ) %>% tab_footnote(
    footnote = "mae = mean absolute error",
    locations = cells_column_labels(columns = mae)
  ) %>%  tab_footnote(
    footnote = "mase = Mean absolute scaled error",
    locations = cells_column_labels(columns = mase))%>%  
    tab_footnote(
    footnote = "smape = Symmetric mean absolute percentage error",
    locations = cells_column_labels(columns = smape))%>%  
    tab_footnote(
    footnote = "rmse = Root mean square error",
    locations = cells_column_labels(columns = rmse))%>%  
    tab_footnote(
    footnote = "msd = Mean signed deviation",
    locations = cells_column_labels(columns = msd))
in-sample accuracy metrics

modelo            mae        mase       smape       rmse       msd
AR.1.              8.860025  3.2036267   88.08911  37.886126   1.2015108891
RW.1.              8.850893  3.2003246   88.03417  37.875528   1.2134274324
AR.1.RW.1.         8.850257  3.2000946   88.03376  37.876086   1.2150688025
spatial.effects    2.722575  0.9844346   64.17138   7.235505  -0.0001837196
iid.effects       11.388121  4.1177408  113.19427  38.533291   0.0001608652

mae = mean absolute error; mase = mean absolute scaled error;
smape = symmetric mean absolute percentage error;
rmse = root mean square error; msd = mean signed deviation

And again we can assess this performance spatially, in this case by province. For example, we can plot the spatial distribution of the mean absolute error for the models with spatially correlated and with independent (iid) province effects:

tbl.yrd.per.sf<-Peru               %>% 
                group_by(reg,prov) %>%
                summarise()        %>% 
                inner_join(tbl.yrd.per,by=c("reg"="reg","prov"="prov"))


map.sp<-tbl.yrd.per.sf %>% 
        filter(.metric=="mae" & modelo=="spatial.effects") %>% 
        ggplot()+
        geom_sf(aes(fill=.estimate),lwd=0.1) +
                  scale_fill_viridis(name="MAE",option="rocket",direction = -1)+
                  theme(plot.title = element_text(hjust = 0.5,size = 20),
                  strip.background =element_rect(fill="#aa3d01"),
                  strip.text = element_text(color = "white",face = "bold",size = 5)) +
        theme_bw()

map.id<-tbl.yrd.per.sf %>% 
        filter(.metric=="mae" & modelo=="iid.effects") %>% 
        ggplot()+
        geom_sf(aes(fill=.estimate),lwd=0.1) +
                  scale_fill_viridis(name="MAE",option="rocket",direction = -1)+
                  theme(plot.title = element_text(hjust = 0.5,size = 20),
                  strip.background =element_rect(fill="#aa3d01"),
                  strip.text = element_text(color = "white",face = "bold",size = 5)) +
        theme_bw()

cowplot::plot_grid(map.sp,map.id)

4 Forecasting

In order to predict the number of deaths in 2019, we need to set this year to NA in the dataset:

db.peru.frct             <- Peru                     %>% 
                            group_by(reg,prov)       %>% 
                            summarise()              %>% 
                            inner_join(db.peru.sp                                   %>%  
                            mutate(n=ifelse(annus == 2019  ,NA,n)),c("reg"="reg","prov"="prov"))
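The key idea is that inla() predicts any observation whose response is NA (through control.predictor). A toy base-R illustration of this masking step, keeping a copy of the true values for later comparison:

```r
# Responses in the forecast year are set to NA, so the model
# treats them as missing and returns fitted values for them.
toy <- data.frame(annus = c(2018, 2018, 2019, 2019),
                  n     = c(5, 7, 6, 8))
toy$n_true <- toy$n                           # keep the observed values
toy$n      <- ifelse(toy$annus == 2019, NA, toy$n)
toy$n                                         # 5 7 NA NA
```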

and finally, in a similar way as above, we assess the forecasted cases spatially:

peru.m8   <- inla(n ~ 1 + f(annus,model = "iid") + 
                            f(week,model="rw1")       +
                            f(id.sp, model = "bym", 
                            graph=W.peru)+
                            temperature+ INEI_prop_aseg_2017 + prop_pobreza_2018 + 
                            INEI_Prop_NoAlumbradoElectri_2017 + INEI_prop_NoAbsAguVvn_int_2017, 
                   data              = db.peru.frct,
                   control.compute   = list(dic = TRUE, 
                                            waic = TRUE, 
                                            cpo = TRUE),
                   control.predictor = list(link = 1)
                  )

db.peru.frct<-db.peru.frct                                        %>% 
              ungroup()                                           %>% 
              mutate(fit=peru.m8$summary.fitted.values$mean)    %>% 
              group_by(reg,prov,year)                             %>% 
              summarise(fit=mean(fit),
                        n=mean(n))



map.true<-db.peru.sf                                     %>% 
          filter(year==2019 ) %>% 
          ggplot()+
          geom_sf(aes(fill=n))+
          theme_bw() +
          facet_wrap(vars(year)) + 
          scale_fill_viridis(name="Average number\n of actual cases",direction = -1,option="rocket")+
          theme(plot.title = element_text(hjust = 0.5,size = 20),
          strip.background =element_rect(fill="#aa3d01"),
          strip.text = element_text(color = "white",face = "bold",size = 5)) 


map.frct<-db.peru.frct                                     %>% 
          filter(year==2019  ) %>% 
          ggplot()+
          geom_sf(aes(fill=fit))+
          theme_bw() +
          facet_wrap(vars(year)) + 
          scale_fill_viridis(name="Average number\n of forecasted cases",direction = -1,option="rocket")+
          theme(plot.title = element_text(hjust = 0.5,size = 20),
          strip.background =element_rect(fill="#aa3d01"),
          strip.text = element_text(color = "white",face = "bold",size = 5)) 


cowplot::plot_grid(map.true,map.frct)